151 research outputs found

    The geography of recent genetic ancestry across Europe

    The recent genealogical history of human populations is a complex mosaic formed by individual migration, large-scale population movements, and other demographic events. Population genomics datasets can provide a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long segments of shared genomic material. We make use of genomic data for 2,257 Europeans (the POPRES dataset) to conduct one of the first surveys of recent genealogical ancestry over the past three thousand years at a continental scale. We detected 1.9 million shared genomic segments, and used the lengths of these to infer the distribution of shared ancestors across time and geography. We find that a pair of modern Europeans living in neighboring populations share around 10-50 genetic common ancestors from the last 1500 years, and upwards of 500 genetic ancestors from the previous 1000 years. These numbers drop off exponentially with geographic distance, but since genetic ancestry is rare, individuals from opposite ends of Europe are still expected to share millions of common genealogical ancestors over the last 1000 years. There is substantial regional variation in the number of shared genetic ancestors: especially high numbers of common ancestors between many eastern populations likely date to the Slavic and/or Hunnic expansions, while much lower levels of common ancestry in the Italian and Iberian peninsulas may indicate weaker demographic effects of Germanic expansions into these areas and/or more stably structured populations. Recent shared ancestry in modern Europeans is ubiquitous, and clearly shows the impact of both small-scale migration and large historical events. 
Population genomic datasets have considerable power to uncover recent demographic history, and will allow a much fuller picture of the close genealogical kinship of individuals across the world. Comment: Full size figures available from http://www.eve.ucdavis.edu/~plralph/research.html; or html version at http://ralphlab.usc.edu/ibd/ibd-paper/ibd-writeup.xhtm
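
As a rough illustration of why the lengths of shared segments carry information about the age of shared ancestors, the sketch below uses a textbook simplification that is an assumption here and not necessarily the inference method of the paper: a segment inherited from a common ancestor g generations back has passed through 2g meioses, so its genetic length is approximately exponentially distributed with mean 100/(2g) centimorgans. The function names are hypothetical.

```python
import math


def mean_ibd_length_cm(generations: int) -> float:
    """Approximate mean length (cM) of a segment shared through one
    common ancestor `generations` back, i.e. across 2 * generations meioses."""
    if generations < 1:
        raise ValueError("generations must be at least 1")
    return 100.0 / (2 * generations)


def prob_longer_than(length_cm: float, generations: int) -> float:
    """Probability that such a segment exceeds `length_cm`,
    under the exponential-length approximation."""
    return math.exp(-length_cm / mean_ibd_length_cm(generations))


if __name__ == "__main__":
    # Older ancestors leave shorter segments on average.
    for g in (5, 20, 50):
        print(g, mean_ibd_length_cm(g), prob_longer_than(2.0, g))
```

Under this approximation, long segments point to recent ancestors, while short segments are compatible with a wide range of ages.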

    Routes for breaching and protecting genetic privacy

    We are entering the era of ubiquitous genetic information for research, clinical care, and personal curiosity. Sharing these datasets is vital for rapid progress in understanding the genetic basis of human diseases. However, one growing concern is the ability to protect the genetic privacy of the data originators. Here, we technically map threats to genetic privacy and discuss potential mitigation strategies for privacy-preserving dissemination of genetic data. Comment: Draft for comment

    A Minimum Column Density of 1 g cm^-2 for Massive Star Formation

    Massive stars are very rare, but their extreme luminosities make them both the only type of young star we can observe in distant galaxies and the dominant energy sources in the universe today. They form rarely because efficient radiative cooling keeps most star-forming gas clouds close to isothermal as they collapse, and this favors fragmentation into stars <~1 Msun. Heating of a cloud by accreting low-mass stars within it can prevent fragmentation and allow formation of massive stars, but what properties a cloud must have to form massive stars, and thus where massive stars form in a galaxy, has not yet been determined. Here we show that only clouds with column densities >~ 1 g cm^-2 can avoid fragmentation and form massive stars. This threshold, and the environmental variation of the stellar initial mass function (IMF) that it implies, naturally explain the characteristic column densities of massive star clusters and the difference between the radial profiles of Halpha and UV emission in galactic disks. The existence of a threshold also implies that there should be detectable variations in the IMF with environment within the Galaxy and in the characteristic column densities of massive star clusters between galaxies, and that star formation rates in some galactic environments may have been systematically underestimated. Comment: Accepted for publication in Nature; Nature manuscript style; main text: 14 pages, 3 figures; supplementary text: 8 pages, 1 figure

    Cryptic Distant Relatives Are Common in Both Isolated and Cosmopolitan Genetic Samples

    Although a few hundred single nucleotide polymorphisms (SNPs) suffice to infer close familial relationships, high density genome-wide SNP data make possible the inference of more distant relationships such as 2nd to 9th cousinships. In order to characterize the relationship between genetic similarity and degree of kinship given a timeframe of 100–300 years, we analyzed the sharing of DNA inferred to be identical by descent (IBD) in a subset of individuals from the 23andMe customer database (n = 22,757) and from the Human Genome Diversity Panel (HGDP-CEPH, n = 952). With data from 121 populations, we show that the average amount of DNA shared IBD in most ethnolinguistically-defined populations, for example Native American groups, Finns and Ashkenazi Jews, differs from continentally-defined populations by several orders of magnitude. Via extensive pedigree-based simulations, we determined bounds for predicted degrees of relationship given the amount of genomic IBD sharing in both endogamous and ‘unrelated’ population samples. Using these bounds as a guide, we detected tens of thousands of 2nd to 9th degree cousin pairs within a heterogeneous set of 5,000 Europeans. The ubiquity of distant relatives, detected via IBD segments, in both ethnolinguistic populations and in large ‘unrelated’ population samples has important implications for genetic genealogy, forensics and genotype/phenotype mapping studies.

    Evolutionary Dynamics of Co-Segregating Gene Clusters Associated with Complex Diseases

    BACKGROUND: The distribution of human disease-associated mutations is not random across the human genome. Despite the fact that natural selection continually removes disease-associated mutations, an enrichment of these variants can be observed in regions of low recombination. There are a number of mechanisms by which such a clustering could occur, including genetic perturbations or demographic effects within different populations. Recent genome-wide association studies (GWAS) suggest that single nucleotide polymorphisms (SNPs) associated with complex disease traits are not randomly distributed throughout the genome, but tend to cluster in regions of low recombination. PRINCIPAL FINDINGS: Here we investigated whether deleterious mutations have accumulated in regions of low recombination due to the impact of recent positive selection and genetic hitchhiking. Using publicly available data on common complex diseases and population demography, we observed an enrichment of hitchhiked disease associations in conserved gene clusters subject to selection pressure. Evolutionary analysis revealed that these conserved gene clusters arose by multiple concerted rearrangement events across the vertebrate lineage. We observed distinct clustering of disease-associated SNPs in evolutionarily rearranged regions of low recombination and high gene density, which harbor genes involved in immunity, that is, the interleukin cluster on 5q31 or RhoA on 3p21. CONCLUSIONS: Our results suggest that multiple lineage-specific rearrangements led to a physical clustering of functionally related and linked genes exhibiting an enrichment of susceptibility loci for complex traits. This implies that besides recent evolutionary adaptations other evolutionary dynamics have played a role in the formation of linked gene clusters associated with complex disease traits.

    Susceptibility of Anopheles stephensi to Plasmodium gallinaceum: A Trait of the Mosquito, the Parasite, and the Environment

    Vector susceptibility to Plasmodium infection is treated primarily as a vector trait, although it is a composite trait expressing the joint occurrence of the parasite and the vector with genetic contributions of both. A comprehensive approach to assess the specific contribution of genetic and environmental variation on "vector susceptibility" is lacking. Here we developed and implemented a simple scheme to assess the specific contributions of the vector, the parasite, and the environment to "vector susceptibility." To the best of our knowledge this is the first study that employs such an approach. We conducted selection experiments on the vector (while holding the parasite "constant") and on the parasite (while holding the vector "constant") to estimate the genetic contributions of the mosquito and the parasite to the susceptibility of Anopheles stephensi to Plasmodium gallinaceum. We separately estimated the realized heritability of (i) susceptibility to parasite infection by the mosquito vector and (ii) parasite compatibility (transmissibility) with the vector while controlling the other. The heritabilities of vector and the parasite were higher for the prevalence, i.e., fraction of infected mosquitoes, than the corresponding heritabilities of parasite load, i.e., the number of oocysts per mosquito. The vector's genetics (heritability) comprised 67% of "vector susceptibility" measured by the prevalence of mosquitoes infected with P. gallinaceum oocysts, whereas the specific contribution of parasite genetics (heritability) to this trait was only 5%. Our parasite source might possess minimal genetic diversity, which could explain its low heritability (and the high value of the vector). Notably, the environment contributed 28%. These estimates are relevant only to the particular system under study, but this experimental design could be useful for other parasite-host systems. 
The prospects and limitations of the genetic manipulation of vector populations to render the vector resistant to the parasite are better considered on the basis of this framework.
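
For readers unfamiliar with realized heritability, the following minimal sketch shows the standard breeder's-equation estimate (response to selection divided by selection differential). It is a generic illustration with hypothetical function and parameter names; the abstract does not describe the exact estimator the authors used, and a threshold trait such as prevalence may call for a different treatment.

```python
def realized_heritability(population_mean: float,
                          selected_parent_mean: float,
                          offspring_mean: float) -> float:
    """Realized heritability h^2 = R / S for one generation of selection.

    S (selection differential) = mean of selected parents - population mean
    R (response)               = offspring mean - population mean
    """
    selection_differential = selected_parent_mean - population_mean
    if selection_differential == 0:
        raise ValueError("no selection was applied (S = 0)")
    response = offspring_mean - population_mean
    return response / selection_differential


if __name__ == "__main__":
    # Hypothetical trait values, not data from the study.
    print(realized_heritability(10.0, 20.0, 15.0))
```

A value near 1 means the offspring mean tracks the selected parents closely, and a value near 0 means selection produced little heritable change.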

    Automation of a problem list using natural language processing

    BACKGROUND: The medical problem list is an important part of the electronic medical record in development in our institution. To serve the functions it is designed for, the problem list has to be as accurate and timely as possible. However, the current problem list is usually incomplete and inaccurate, and is often totally unused. To alleviate this issue, we are building an environment where the problem list can be easily and effectively maintained. METHODS: For this project, 80 medical problems were selected for their frequency of use in our future clinical field of evaluation (cardiovascular). We have developed an Automated Problem List system composed of two main components: a background and a foreground application. The background application uses Natural Language Processing (NLP) to harvest potential problem list entries from the list of 80 targeted problems detected in the multiple free-text electronic documents available in our electronic medical record. These proposed medical problems drive the foreground application designed for management of the problem list. Within this application, the extracted problems are proposed to the physicians for addition to the official problem list. RESULTS: The set of 80 targeted medical problems selected for this project covered about 5% of all possible diagnoses coded in ICD-9-CM in our study population (cardiovascular adult inpatients), but about 64% of all instances of these coded diagnoses. The system contains algorithms to detect first document sections, then sentences within these sections, and finally potential problems within the sentences. The initial evaluation of the section and sentence detection algorithms demonstrated a sensitivity and positive predictive value of 100% when detecting sections, and a sensitivity of 89% and a positive predictive value of 94% when detecting sentences. 
CONCLUSION: The global aim of our project is to automate the process of creating and maintaining a problem list for hospitalized patients and thereby help to guarantee the timeliness, accuracy and completeness of this information.
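
The two evaluation metrics quoted in the results, sensitivity and positive predictive value, can be computed from detection counts as in the short sketch below. The counts in the usage example are hypothetical and are not taken from the study.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real items (e.g. sentences) that the detector found."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no real items to detect")
    return true_positives / total


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of the detector's outputs that were correct."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("the detector produced no outputs")
    return true_positives / total


if __name__ == "__main__":
    # Hypothetical counts for a sentence detector.
    tp, fn, fp = 80, 20, 10
    print(sensitivity(tp, fn), positive_predictive_value(tp, fp))
```

Sensitivity penalizes missed items, while positive predictive value penalizes spurious detections, which is why the abstract reports both.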

    Evidence for Hitchhiking of Deleterious Mutations within the Human Genome

    Deleterious mutations present a significant obstacle to adaptive evolution. Deleterious mutations can inhibit the spread of linked adaptive mutations through a population; conversely, adaptive substitutions can increase the frequency of linked deleterious mutations and even result in their fixation. To assess the impact of adaptive mutations on linked deleterious mutations, we examined the distribution of deleterious and neutral amino acid polymorphism in the human genome. Within genomic regions that show evidence of recent hitchhiking, we find fewer neutral but a similar number of deleterious SNPs compared to other genomic regions. The higher ratio of deleterious to neutral SNPs is consistent with simulated hitchhiking events and implies that positive selection eliminates some deleterious alleles and increases the frequency of others. The distribution of disease-associated alleles is also altered in hitchhiking regions. Disease alleles within hitchhiking regions have been associated with auto-immune disorders, metabolic diseases, cancers, and mental disorders. Our results suggest that positive selection has had a significant impact on deleterious polymorphism and may be partly responsible for the high frequency of certain human disease alleles.

    Patterns of Ancestry, Signatures of Natural Selection, and Genetic Association with Stature in Western African Pygmies

    African Pygmy groups show a distinctive pattern of phenotypic variation, including short stature, which is thought to reflect past adaptation to a tropical environment. Here, we analyze Illumina 1M SNP array data in three Western Pygmy populations from Cameroon and three neighboring Bantu-speaking agricultural populations with whom they have admixed. We infer genome-wide ancestry, scan for signals of positive selection, and perform targeted genetic association with measured height variation. We identify multiple regions throughout the genome that may have played a role in adaptive evolution, many of which contain loci with roles in growth hormone, insulin, and insulin-like growth factor signaling pathways, as well as immunity and neuroendocrine signaling involved in reproduction and metabolism. The most striking results are found on chromosome 3, which harbors a cluster of selection and association signals between approximately 45 and 60 Mb. This region also includes the positional candidate genes DOCK3, which is known to be associated with height variation in Europeans, and CISH, a negative regulator of cytokine signaling known to inhibit growth hormone-stimulated STAT5 signaling. Finally, pathway analysis for genes near the strongest signals of association with height indicates enrichment for loci involved in insulin and insulin-like growth factor signaling.

    Dark Energy Survey Year 3 results: Constraints on extensions to ΛCDM with weak lensing and galaxy clustering

    We constrain six possible extensions to the Λ cold dark matter (CDM) model using measurements from the Dark Energy Survey's first three years of observations, alone and in combination with external cosmological probes. The DES data are the two-point correlation functions of weak gravitational lensing, galaxy clustering, and their cross-correlation. We use simulated data vectors and blind analyses of real data to validate the robustness of our results to astrophysical and modeling systematic errors. In many cases, constraining power is limited by the absence of theoretical predictions beyond the linear regime that are reliable at our required precision. The ΛCDM extensions are dark energy with a time-dependent equation of state, nonzero spatial curvature, additional relativistic degrees of freedom, sterile neutrinos with eV-scale mass, modifications of gravitational physics, and a binned $\sigma_8(z)$ model which serves as a phenomenological probe of structure growth. For the time-varying dark energy equation of state evaluated at the pivot redshift we find $(w_p, w_a) = (-0.99^{+0.28}_{-0.17}, -0.9 \pm 1.2)$ at 68% confidence with $z_p = 0.24$ from the DES measurements alone, and $(w_p, w_a) = (-1.03^{+0.04}_{-0.03}, -0.4^{+0.4}_{-0.3})$ with $z_p = 0.21$ for the combination of all data considered. Curvature constraints of $\Omega_k = 0.0009 \pm 0.0017$ and effective relativistic species $N_{\rm eff} = 3.10^{+0.15}_{-0.16}$ are dominated by external data, though adding DES information to external low-redshift probes tightens the $\Omega_k$ constraints that can be made without cosmic microwave background observables by 20%. For massive sterile neutrinos, DES combined with external data improves the upper bound on the mass $m_{\rm eff}$ by a factor of 3 compared to previous analyses, giving 95% limits of $(\Delta N_{\rm eff}, m_{\rm eff}) \leq (0.28, 0.20\ \mathrm{eV})$ when using priors matching a comparable Planck analysis. 
For modified gravity, we constrain changes to the lensing and Poisson equations controlled by functions $\Sigma(k,z) = \Sigma_0 \Omega_\Lambda(z)/\Omega_{\Lambda,0}$ and $\mu(k,z) = \mu_0 \Omega_\Lambda(z)/\Omega_{\Lambda,0}$, respectively, to $\Sigma_0 = 0.6^{+0.4}_{-0.5}$ from DES alone and $(\Sigma_0, \mu_0) = (0.04 \pm 0.05, 0.08^{+0.21}_{-0.19})$ for the combination of all data, both at 68% confidence. Overall, we find no significant evidence for physics beyond ΛCDM.
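
To make the pivot-redshift notation concrete, the sketch below evaluates a time-varying dark energy equation of state under an assumed form: the common linear-in-scale-factor (w0-wa) parametrization rewritten around a pivot, w(a) = w_p + w_a (a_p - a) with a = 1/(1+z). The abstract does not spell out the parametrization, so this form and the function names are assumptions made for illustration.

```python
def scale_factor(z: float) -> float:
    """Scale factor a corresponding to redshift z."""
    return 1.0 / (1.0 + z)


def w_of_z(z: float, w_p: float, w_a: float, z_p: float) -> float:
    """Equation of state w(z) in the assumed pivot form
    w(a) = w_p + w_a * (a_p - a)."""
    return w_p + w_a * (scale_factor(z_p) - scale_factor(z))


if __name__ == "__main__":
    # Central values quoted for the combination of all data; uncertainties ignored.
    w_p, w_a, z_p = -1.03, -0.4, 0.21
    for z in (0.0, z_p, 1.0):
        print(z, w_of_z(z, w_p, w_a, z_p))
```

By construction w equals w_p at the pivot redshift, and a cosmological constant corresponds to w_p = -1 with w_a = 0 at every redshift.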